tinyForestR

The aim of the tinyForestR package is to provide a set of tools designed to extract, manipulate and analyse data relevant to the location of Tiny Forests in the UK.
Specifically, it extracts and processes landcover and biodiversity data from a range of sources for a given area around Tiny Forest locations, and provides a set of tools for analysing citizen science data derived directly from Tiny Forests.
Tiny Forests are small-scale, dense tree plantings, usually in urban areas, managed by Earthwatch: https://earthwatch.org.uk/get-involved/tiny-forest
The package is hosted on GitHub and is a work in progress. It can be installed by running devtools::install_github("julianflowers/tinyForestR"). It also loads another GitHub package, myScrapers, which is needed to download data from the Tiny Forest website.
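The installation step can be run as follows (a minimal sketch; it assumes devtools is available, or can be installed first):

```r
# install.packages("devtools")  # uncomment if devtools is not yet installed
devtools::install_github("julianflowers/tinyForestR")
library(tinyForestR)
```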
The package makes use of a number of Application Programming Interfaces (APIs) some of which require API keys which will need to be applied for separately. This is outlined in the relevant sections of this vignette.
It uses a range of Python packages to access some datasets (in some cases the Python packages are better developed than their R equivalents). For this reason the first step is to run initialise_tf() to initialise the package.
This:

- Loads the reticulate package, which R uses to talk to Python.
- Sets up a Python virtual environment (VE) called tinyforest, and ensures R uses the correct version of Python by setting the RETICULATE_PYTHON environment variable. It may be necessary to restart R to make sure R correctly uses this. You will be asked if you want to remove the tinyforest environment. Say "no" unless you have a previous installation and are having trouble, in which case removing it will create a clean install of the VE.
- Installs a number of Python packages into the tinyforest environment, including:
  - earthengine-api, which enables access to Google Earth Engine (GEE)
  - geemap, a set of added-value tools to extract and manipulate GEE data
  - osdatahub, for access to Ordnance Survey National Geographic Database data
  - OSGridConverter, which converts lat-longs to UK grid references
- Imports the relevant modules for use in other functions.
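Taken together, the setup is a single call (a sketch; the prompt about removing the tinyforest environment appears interactively):

```r
library(tinyForestR)
initialise_tf()  # creates/uses the 'tinyforest' virtual environment

# if R does not pick up the new environment, restart R and check:
# Sys.getenv("RETICULATE_PYTHON")
```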
The next step is to load Tiny Forest (TF) data. Because this only exists in a series of web pages, the get_tf_data function identifies the relevant pages and iterates over them to extract name, id, location, area, planters, and types of tree planted (as a list column), for those TFs planted at the time of extraction. It also includes TFs planted outside the UK. The function takes about 120 seconds to iterate over all the relevant pages.
get_tf_data does four things:

- extracts a data frame of Tiny Forest names and IDs
- extracts a data frame of additional TF details (tree species, areas and locations)
- extracts an sf (spatial) data frame of UK TF locations
- produces an interactive map of TF locations
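A typical first call looks like this (a sketch, based on the components used later in this vignette: tidy_tf, tidy_tf_sf and map):

```r
library(tinyForestR)

tf_data <- get_tf_data()  # takes around 120 seconds
names(tf_data)            # components include tidy_tf, tidy_tf_sf and map
tf_data$map               # interactive map of TF locations
```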
Producing a simple map is as easy as printing tf_data$map.
Or we can draw a more sophisticated static map using
tf_data$tidy_tf_sf.
## load UK boundary files
uk <- read_sf("https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/Countries_December_2022_UK_BGC/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
ggplot() +
geom_sf(data = uk, fill = "white") +
geom_sf(data = tf_data$tidy_tf_sf, pch = 1, aes(colour = factor(year(plant_date)))) +
labs(title = "Tiny Forest locations 2024",
subtitle = paste("N = ", nrow(tf_data$tidy_tf_sf))) +
theme_void() +
scale_colour_viridis_d(option = "turbo")

It is simple to draw time series plots…
df <- tf_data$tidy_tf_sf
df |>
count(year = year(plant_date)) |>
ggplot() +
geom_col(aes(year, n)) +
geom_label(aes(year, n, label = n)) +
labs(title = "TFs planted per year") +
theme_minimal() +
theme(plot.title.position = "plot")

Once the data is loaded we can save it as a csv file and get some high-level information on planting, timings, size and so on.
needs(patchwork)
## tree species per TF
tf_trees <- df |>
mutate(trees_1 = str_split(trees, "\\|")) |>
select(-trees) |>
unnest("trees_1")
tf_trees |>
group_by(tf_id) |>
summarise(n = n()) |>
ggplot(aes(n)) +
geom_histogram() +
labs(title = "Distribution of tree species",
x = "Number of tree species per TF") +
theme_minimal() +
theme(plot.title.position = "plot")

We can also look at planting frequency for different tree species.
tf_trees |>
ungroup() |>
count(trees_1, sort = TRUE) |>
top_n(25, n) |>
ggplot() +
geom_col(aes(n, reorder(trees_1, n))) +
labs(y = "",
title = "Most commonly planted tree species",
x = "Number of Tiny Forests") +
theme_minimal() +
theme(plot.title.position = "plot")

We can also download an image for a TF via Google Maps. This requires an API key, which can be obtained as follows:
https://developers.google.com/maps/documentation/streetview/cloud-setup
library(ggmap);library(magick)
tf_df <- tf_data$tidy_tf
api_key <- Sys.getenv("GOOGLE_MAPS")
image <- tinyForestR::get_tf_images(lon = tf_df$lon[100], lat = tf_df$lat[100], key = api_key, zoom = 20, tf_id = tf_df$tf_id[100])
image_read(image)

The get_nbn_buffer function downloads occurrence data from the NBN Atlas in a set buffer around a given longitude and latitude. For example, we can download 30000 records around the Darlington TF. Note: when extracting data from APIs it helps to create a "safe" version of your function, especially if you are making multiple API calls (see below). This makes sure that your function will keep on running even if a single call fails. It is achieved by using the safely function, which stores the output as a list with two elements, result and error. To get the results we need to extract the result element.
safe_buff <- safely(tinyForestR::get_nbn_buffer, otherwise = NA_real_)
nbn_data <- safe_buff(tf_df$lon[100], tf_df$lat[100], n = 30000, tf_id = tf_df$tf_id[100])
#> 1.435 sec elapsed
nbn_out <- nbn_data$result |>
mutate( tf_id = tf_df$tf_id[100])
nbn_out |>
filter(year > 2014) |>
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) |>
ggplot() +
geom_sf(aes(colour = phylum))

For ecological analyses we need to transform the data from long to wide format. With an NBN Atlas extract we can deploy the create_species_matrix function, which creates a species matrix and a list of sites or observation ids.
For example we can extract a matrix of all insect observations…
needs(tinyForestR)
## create a matrix of observations by class and year
spp_matrix <- create_species_matrix(nbn_out, class = "Insecta")
tf_bd_insects <- calc_bd_metrics(df = nbn_out, class = "Insecta")
tf_bd_birds <- calc_bd_metrics(df = nbn_out, class = "Aves")
tf_bd_mammals <- calc_bd_metrics(df = nbn_out, class = "Mammalia")
tf_bd_mammals$metrics
#> month richness N ratio diversity
#> 1 03 1 5 0.2000000 0.0000000
#> 2 04 1 6 0.1666667 0.0000000
#> 3 05 6 10 0.6000000 0.7800000
#> 4 06 5 15 0.3333333 0.7822222
#> 5 08 5 6 0.8333333 0.7777778
#> 6 09 1 4 0.2500000 0.0000000
#> 7 10 1 2 0.5000000 0.0000000
#> 8 <NA> 1 2 0.5000000 0.0000000

To compare biodiversity across TF areas we can extract NBN data from multiple sites using the purrr package. Set .progress = TRUE to track the time it takes to run. Note: the NBN Atlas API has a 500000-record download limit, and it is sometimes a bit temperamental.
## sample tf_ids
##
sample <- sample_n(tf_df, 20)
output <- map(1:nrow(sample), \(x) safe_buff(lon = sample$lon[x], lat = sample$lat[x], tf_id = sample$tf_id[x], n = 30000), .progress = TRUE)
#> 0.154 sec elapsed
#> 5.448 sec elapsed
#> 3.593 sec elapsed
#> 9.58 sec elapsed
#> 1.997 sec elapsed
#> 2.462 sec elapsed
#> 5.09 sec elapsed
#> 5.501 sec elapsed
#> 4.279 sec elapsed
#> 17.907 sec elapsed
#> 4.786 sec elapsed
#> 19.061 sec elapsed
#> 1.062 sec elapsed
#> 1.117 sec elapsed
#> 11.114 sec elapsed
#> 0.259 sec elapsed
#> 3.96 sec elapsed
#> 1.631 sec elapsed
#> 5.553 sec elapsed
#> 4.237 sec elapsed
res <- map(output, "result")
res_1 <- res[which(!is.na(res))]
res_df <- list_rbind(res_1)

How many observations per site per year?
We have 189205 rows of data across the sampled sites.
sites <- res_df[["site"]] |> unique()
res_counts <- res_df |>
filter(between(year, 2010, 2019)) |>
count(site, year)
res_counts |>
ggplot() +
geom_line(aes(year, n, group = site), colour = "grey90") +
geom_line(aes(year, n, group = site), colour = "red", data = res_counts |> filter(site == sites[2])) +
labs(y = "No of observations",
caption = "Source: NBN API",
x = "Year",
title = "Annual NBN observation counts per TF site",
subtitle = paste0("Site ", res_counts |> filter(site == sites[2]) |> select(site) |> slice(1), " shown as red line")) +
theme_minimal() +
theme(plot.title.position = "plot") +
scale_x_continuous(n.breaks = 10)
res_counts_site <- res_df |>
filter(between(year, 2010, 2019),
site == sites[2]) |>
count(site, classs, year) |>
filter(n > 9) |>
ggplot(aes(factor(year), n)) +
geom_col() +
facet_wrap(~ classs, scales = "free") +
scale_x_discrete()
res_counts_site

For some taxa, alternative data sources may add additional value. For example, data from the 2020 Botanical Society of Britain and Ireland (BSBI) survey may give a more complete plant record for each 1 km grid square (see below), and eBird data may be more up to date for bird observations.
eBird data can be obtained by creating an account on the eBird website and completing a request form specifying location and requirements. Once approved, the data is made available as a download.
I have also included functions to extract data from the 2020 BSBI survey. This is publicly available for UK National Grid 1 km monads, which requires conversion of lat-longs to UK grid references.
grid_ref <- tinyForestR::os_lat_lon_to_grid(lon = sample$lon[2], lat = sample$lat[2])
#> Using virtual environment 'tinyforest' ...
grid_ref$grid
#> [1] "SK5606"

bsbi_data <- tinyForestR::get_bsbi_data(grid_ref = grid_ref$grid)
bsbi_data <- bsbi_data |>
enframe() |>
unnest("value") |>
unnest("value") |>
mutate(year = str_extract(value, "20\\d{2}"),
value = str_remove(value, year),
count = parse_number(value),
value = str_remove(value, as.character(count)),
value = str_remove(value, "\\d{1,}"),
grid = grid_ref$grid,
tf_id = sample$site[2],
species = str_remove(value, "\\(") ,
species = str_remove(species, "\\)"),
species = str_squish(species)) |>
arrange(value) |>
drop_na()

The calc_bd_metrics function takes an output from get_nbn_buffer or get_bsbi_data, converts the data from long to wide format, creates a species matrix for a specified class (for get_nbn_buffer data), and outputs a list containing a metrics data frame and a plot (accessed as $metrics and $plot). For example:
| month | richness | N | ratio | diversity |
|---|---|---|---|---|
| 01 | 15 | 15 | 1.0000000 | 0.9333333 |
| 03 | 8 | 8 | 1.0000000 | 0.8750000 |
| 05 | 37 | 105 | 0.3523810 | 0.9668934 |
| 06 | 36 | 74 | 0.4864865 | 0.9667641 |
| 08 | 7 | 7 | 1.0000000 | 0.8571429 |
| 10 | 1 | 1 | 1.0000000 | 0.0000000 |
| 11 | 1 | 1 | 1.0000000 | 0.0000000 |
metrics$plot +
labs(title = "Monthly species richness for ",
subtitle = paste("Aves", tf_df$site[1]),
y = "Richness",
x = "Month") +
theme(plot.title.position = "plot")

The package includes a calc_ndvi_buff function to enable the calculation of the normalised difference vegetation index (NDVI) for the buffer area around a given point. It uses Sentinel-2 surface reflectance satellite images, which are available at 10 m resolution and are regularly updated. The function extracts images via the Google Earth Engine API and requires registration and authentication prior to use (see…).
The function returns a list including image dates, NDVI statistics for the image, an interactive map and a raster. Note it may take a few minutes to run.
The code chunk below calculates vegetation indices for the images covering the buffer around a TF before and after planting and compares their distributions. (Note, the function selects only those S2 images with cloud cover < 10%.)
plant_date <- sample$plant_date |>
unique()
ee <- import("ee")
geemap <- import("geemap")
ee$Initialize()
i <- 10
after <- get_vegetation_indices(tf_id = sites[i], buffer = 30, lon = sample$lon[i], lat = sample$lat[i], start = as.character(sample$plant_date[i]), end = as.character(today() ))
#> Generating URL ...
#> Downloading data from https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/913fc84489280432b0e0b7c0fb25d432-4a4fc643476eba97e2ce07f839807db9:getPixels
#> Please wait ...
#> Data downloaded to /var/folders/bk/jrqs03tx5mq9s28mhml5xzhm0000gn/T/RtmpEnexwO/s2.tif
before <- get_vegetation_indices(tf_id = sites[i], buffer = 30, lon = sample$lon[i], lat = sample$lat[i], end = as.character(sample$plant_date[i]), start = as.character(sample$plant_date[i] - 365))
#> Generating URL ...
#> Downloading data from https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/e31c8c6d6f32586f6ef09e805e321a11-04eb74115a28c8845c093ad93a83b326:getPixels
#> Please wait ...
#> Data downloaded to /var/folders/bk/jrqs03tx5mq9s28mhml5xzhm0000gn/T/RtmpEnexwO/s2.tif
before$sf |>
ggplot() +
geom_density(aes(GLI)) +
geom_density(aes(GLI), data = after$sf, colour = "green") +
labs(title = paste(sample$plant_date[i], sites[i])) +
theme_minimal()